Antipattern: Using Indexes Without a Plan
Let's see how using indexes inappropriately affects a query.
We'll cover the following
When we choose our indexes by guessing, we inevitably make some wrong choices. Misunderstandings about when to use indexes lead to mistakes in one of these three categories:
-
Defining no indexes or not enough indexes
-
Defining too many indexes, or indexes that don’t help
-
Running queries that no index can help
No indexes#
We commonly read that a database incurs overhead costs as it keeps an index up-to-date. Each time we use INSERT
, UPDATE
, or DELETE
, the database has to update the index data structures for that table to be consistent so that our subsequent searches are able to use these indexes to find the right set of rows reliably.
We’re trained to think that overheads mean waste. Thus, when we read that databases incur overheads to keep their indexes updated, we instinctively want to eliminate those overheads. Some developers conclude that the remedy is to eliminate the indexes themselves. This advice is common, but it ignores the fact that indexes have benefits that justify the overhead cost they cause.
Not all overhead is waste. Don’t our companies employ administrative staff, legal professionals, and accountants? Don’t they pay for facilities even though these expenses don’t directly contribute to generating revenue? Of course, they do, and it is because these expenses contribute to the success of companies in significant ways.
In a typical application, we’ll run hundreds of queries against a table to update it for everyone. Every time we run a query that uses an index, we win back the overhead that went into maintaining that index.
An index can also help an UPDATE
or DELETE
statement by finding the desired rows quickly. For example, the index on the bug_id
primary key helps the following statement:
Run the query in the following playground to see how the data is updated.
A statement that searches an unindexed column performs a full table scan to find a match.
Too many indexes#
We benefit from an index only if we run queries that use that index. There’s no benefit to creating indexes that we won’t use. Here are some examples:
Indexes aren’t standard
In the previous example, there are several useless indexes:
-
bug_id
: Most databases create an index automatically for a primary key, so it’s redundant to define another index for this. This means that such an index would only cause extra overhead. Each database brand has its own rules for when to create an index automatically, so we need to read the documentation for the database that we use. -
summary
: An index for a long string data type likeVARCHAR(80)
is larger than an index for a more compact data type. We’re also not likely to run queries that search or sort by the entiresummary
column. -
hours
: This is another example of a column that we’re probably not going to need to search specific values from. -
bug_id, date_reported, status
: There are good reasons to use compound indexes, but many people create compound indexes that are redundant or seldom used. Besides avoiding this, it is also important to remember that the correct ordering of columns in a compound index is essential; we should use the columns in a left-to-right order in search criteria, join criteria, or sorting order.
Some people create indexes on every column and combination of columns because they don’t know which indexes will benefit their queries. However, this overlooks the fact that when they cover a database table with indexes, they incur a lot of overhead with no assurance of payoff.
When no index can help#
The next type of mistake is to run a query that can’t use any index. Developers create more and more indexes, trying to find some magical combination of columns or index options to make their query run faster, but it is all in vain.
We can think of a database index using the analogy of a telephone book. If I ask you were asked to look up everyone in a telephone book whose last name was Charles, it would be an easy task, since all the people with the same last name are listed together in a telephone book.
However, if we were asked to look up everyone in the telephone book whose first name was Charles, then the order of the names in the book wouldn’t benefit us. Anyone could have that first name, regardless of their last name, so we would have to search through the entire book line by line.
The telephone book is ordered by the last name and then by the first name, just like a compound database index on last_name
, first_name
. This index doesn’t help us search by the first name.
Some examples of queries that can’t benefit from this index include the following:
SELECT * FROM Accounts ORDER BY first_name, last_name;
This query shows the telephone book scenario. If we create a compound index for the two columns, last_name
followed by first_name
(as in a telephone book), the index doesn’t help us sort primarily by first_name
.
SELECT * FROM Bugs WHERE MONTH(date_reported) = 4;
We could create an index for the date_reported
column, but even then the order of the index doesn’t help us search by month. The order of this index is based on the entire date, starting with the year. But each year contains the fourth month, so the rows where the month is equal to 4 are scattered throughout the index.
Some databases support indexes on expressions or indexes on generated columns, as well as indexes on plain columns. But we have to define an index prior to using it, and the index only helps for the expression that we specify in its definition.
SELECT * FROM Bugs WHERE last_name = 'Charles' OR first_name = 'Charles';
Thus, we’re back to the problem in the example above: the rows that contain our specified first name are scattered unpredictably according to the order of the index that we have defined. The result of the previous query is the same as the result of the following:
SELECT * FROM Bugs WHERE last_name = 'Charles'
UNION
SELECT * FROM Bugs WHERE first_name = 'Charles';
The index in our example only helps us find our specified last name; it doesn’t help us find our specified first name.
Low-selectivity indexes
SELECT * FROM Bugs WHERE description LIKE ‘%crash%’;
Because the pattern in this search predicate could occur anywhere in the string, the sorted index data structure is unable to help us.